library(haven) # For reading Stata data files
library(dplyr) # For data manipulation
library(here) # For building file paths
Exploring the Penn World Tables with R
Introduction
Welcome to an exploration of the Penn World Tables (PWT) using R! In this tutorial, you’ll learn how to read, manipulate, and analyze PWT data.
Prerequisites
Before diving into the analysis, let’s load the necessary R packages. These packages will help us read data files, manipulate data efficiently, and manage file paths easily.
Don’t worry about messages regarding function masking; they are typical when multiple packages have similar functions.
Reading the Data
Let’s kick things off by reading the PWT dataset into R. We’ll use the here() function to ensure the path is relative to your project directory.
penn <- read_dta(here("databases/pwt100.dta"))
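If the read fails, the usual cause is that the file is not where here() expects it. The following is a minimal sketch of a check you could run first, assuming the same databases/pwt100.dta location used in this tutorial; pwt_path is a name introduced here purely for illustration.

```r
library(here)

# Hypothetical variable name; the path components match the tutorial's file location
pwt_path <- here("databases", "pwt100.dta")

if (file.exists(pwt_path)) {
  # Read the Stata file only when it is actually present
  penn <- haven::read_dta(pwt_path)
} else {
  # Show the full path that here() resolved, which helps spot a wrong project root
  message("File not found: ", pwt_path)
}
```

Printing the resolved path is often enough to see whether R is looking in the project directory you intended.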
Take a peek at the dataset to understand its structure and content. We take the first six rows with head() and then show only the first seven columns:
head(penn)[1:7]
# A tibble: 6 × 7
countrycode country currency_unit year rgdpe rgdpo pop
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ABW Aruba Aruban Guilder 1950 NA NA NA
2 ABW Aruba Aruban Guilder 1951 NA NA NA
3 ABW Aruba Aruban Guilder 1952 NA NA NA
4 ABW Aruba Aruban Guilder 1953 NA NA NA
5 ABW Aruba Aruban Guilder 1954 NA NA NA
6 ABW Aruba Aruban Guilder 1955 NA NA NA
Let’s also view the last few rows to get a sense of the data’s scope:
tail(penn)[1:7]
# A tibble: 6 × 7
countrycode country currency_unit year rgdpe rgdpo pop
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 ZWE Zimbabwe US Dollar 2014 37861. 38675. 13.6
2 ZWE Zimbabwe US Dollar 2015 40142. 39799. 13.8
3 ZWE Zimbabwe US Dollar 2016 41875. 40963. 14.0
4 ZWE Zimbabwe US Dollar 2017 44672. 44317. 14.2
5 ZWE Zimbabwe US Dollar 2018 44325. 43421. 14.4
6 ZWE Zimbabwe US Dollar 2019 42296. 40827. 14.6
Each column holds one variable, and each row holds the values of all variables for one country in one year.
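One way to confirm this one-row-per-country-year structure is to count rows for each pair and look for counts above one. The sketch below uses a small made-up table (toy, with invented values) rather than the real data, so it runs on its own; with the real dataset you would replace toy with penn.

```r
library(dplyr)

# Toy stand-in for penn; the values are invented for illustration only
toy <- tibble(
  countrycode = c("AAA", "AAA", "BBB", "BBB"),
  year        = c(2000, 2001, 2000, 2001),
  pop         = c(1.0, 1.1, 5.0, 5.2)
)

# Count rows per country-year pair; any count above 1 signals a duplicate
duplicates <- toy %>%
  count(countrycode, year) %>%
  filter(n > 1)

# Zero rows here means each country-year pair appears exactly once
nrow(duplicates)
```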
Analyzing GDP Per Capita
We’ll now focus on analyzing GDP per capita over time for different countries. First, let’s select the necessary variables:
temp <- penn %>% select(countrycode, year, rgdpna, pop)
head(temp)
# A tibble: 6 × 4
countrycode year rgdpna pop
<chr> <dbl> <dbl> <dbl>
1 ABW 1950 NA NA
2 ABW 1951 NA NA
3 ABW 1952 NA NA
4 ABW 1953 NA NA
5 ABW 1954 NA NA
6 ABW 1955 NA NA
Next, we calculate GDP per capita (ypop) by dividing GDP (rgdpna) by the population (pop):
temp$ypop <- temp$rgdpna / temp$pop
Check out the result of this calculation:
tail(temp)
# A tibble: 6 × 5
countrycode year rgdpna pop ypop
<chr> <dbl> <dbl> <dbl> <dbl>
1 ZWE 2014 41274. 13.6 3038.
2 ZWE 2015 42008. 13.8 3041.
3 ZWE 2016 42326. 14.0 3017.
4 ZWE 2017 44317. 14.2 3113.
5 ZWE 2018 46457. 14.4 3218.
6 ZWE 2019 42694. 14.6 2915.
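The same column can also be created with dplyr’s mutate(), which keeps the calculation inside a pipeline. This is a sketch on a two-row stand-in table (toy) built from the rounded Zimbabwe figures printed above, so its results differ slightly from the printed ypop values. It assumes, as in the PWT variable descriptions, that rgdpna is in millions of US dollars and pop is in millions of people, which makes the ratio dollars per person.

```r
library(dplyr)

# Stand-in for temp, using rounded values copied from the printed output
toy <- tibble(
  countrycode = c("ZWE", "ZWE"),
  year        = c(2018, 2019),
  rgdpna      = c(46457, 42694),  # assumed unit: millions of US dollars
  pop         = c(14.4, 14.6)     # assumed unit: millions of people
)

# mutate() adds ypop without repeating the data frame name on each side
toy <- toy %>%
  mutate(ypop = rgdpna / pop)

toy
```

Both forms give the same numbers; mutate() is mainly convenient when the step is one of several chained with %>%.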
Plotting GDP Per Capita for the USA
Visualizing data makes it more comprehensible. Let’s create a plot showing the GDP per capita for the USA over time:
temp_usa <- temp %>% filter(countrycode == "USA")
plot(temp_usa$year, temp_usa$ypop / 1000, main = "GDP Per Capita in the USA",
     xlab = "", ylab = "Thousands of US dollars",
     type = "l", col = "blue", lwd = 2, las = 1)
Calculating Growth Rates
To understand economic dynamics, calculating growth rates is crucial. We’ll first demonstrate an incorrect calculation to highlight common pitfalls:
temp_example <- temp %>%
  filter(countrycode %in% c("USA", "COL"), year %in% 2006:2009) %>%
  select(countrycode, year, ypop)
temp_example$growth <- 100 * (temp_example$ypop - lag(temp_example$ypop)) / lag(temp_example$ypop)
temp_example
# A tibble: 8 × 4
countrycode year ypop growth
<chr> <dbl> <dbl> <dbl>
1 COL 2006 10031. NA
2 COL 2007 10575. 5.43
3 COL 2008 10795. 2.08
4 COL 2009 10797. 0.0207
5 USA 2006 55484. 414.
6 USA 2007 55989. 0.910
7 USA 2008 55382. -1.08
8 USA 2009 53480. -3.43
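Note that the growth rate for the USA for 2006 is not correct. Without grouping, lag() simply takes the value from the previous row, so the USA’s 2006 figure is compared with Colombia’s 2009 figure: 100 * (55484 - 10797) / 10797 is roughly 414. The calculation ignores the boundary between countries. Here’s the correct approach: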
temp_corrected <- temp_example %>%
  group_by(countrycode) %>%
  mutate(growth = 100 * (ypop - lag(ypop)) / lag(ypop))
temp_corrected
# A tibble: 8 × 4
# Groups: countrycode [2]
countrycode year ypop growth
<chr> <dbl> <dbl> <dbl>
1 COL 2006 10031. NA
2 COL 2007 10575. 5.43
3 COL 2008 10795. 2.08
4 COL 2009 10797. 0.0207
5 USA 2006 55484. NA
6 USA 2007 55989. 0.910
7 USA 2008 55382. -1.08
8 USA 2009 53480. -3.43
Grouping by country makes lag() restart within each country, so the first year of every country has no previous value and its growth rate is NA.
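Because lag() works on row order, the grouped calculation is only right when the rows within each country are sorted by year. The sketch below is one defensive variant, not part of the original workflow: it sorts explicitly with arrange() and drops the grouping afterwards with ungroup(). The table toy is a stand-in built from the rounded ypop values printed above, with the rows deliberately shuffled.

```r
library(dplyr)

# Stand-in for temp_example with rows deliberately out of order
toy <- tibble(
  countrycode = c("USA", "COL", "USA", "COL", "COL", "USA", "COL", "USA"),
  year        = c(2007, 2009, 2006, 2006, 2008, 2009, 2007, 2008),
  ypop        = c(55989, 10797, 55484, 10031, 10795, 53480, 10575, 55382)
)

toy_growth <- toy %>%
  arrange(countrycode, year) %>%   # sort so lag() sees the previous year
  group_by(countrycode) %>%        # restart lag() within each country
  mutate(growth = 100 * (ypop - lag(ypop)) / lag(ypop)) %>%
  ungroup()                        # avoid carrying the grouping into later steps

toy_growth
```

The first year of each country comes out as NA, which is the signal that no value was borrowed from another country.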
Conclusion
In this guide, you’ve learned how to work with the Penn World Tables in R. From reading and manipulating data to calculating GDP per capita and growth rates, these techniques are fundamental for economic data analysis.